Abstract

Background: The study of genome-scale metabolic models and their underlying networks is one of the
most important fields in systems biology. The complexity of these models and of their description makes
computational tools an essential element of their study. There is therefore a strong need for efficient and
versatile computational tools for research in this area.

Results: In this manuscript we present PyNetMet, a Python library of tools for working with networks and
metabolic models. These are free, open-source tools for use on a Python platform, which makes them
considerably more versatile than comparable desktop software. In addition, these tools allow one
to work with both standard formats for metabolic models (OptGene and SBML), and the fact that they are
programmed in Python opens the possibility of efficient integration with any other existing Python tool.

Conclusions: PyNetMet is, therefore, a collection of computational tools that will facilitate research
on metabolic models and networks.
1 Background

Nowadays, the genome-scale reconstruction of metabolic models has become one of the cornerstones of
systems biology. Reconstructed metabolic models have been used in a wide range of applications, such as
the study of the regulation and operation of metabolism [1-4], the determination of optimal growth
conditions and the prediction of the maximum biomass yield for a given organism [5], the search for potential
sites for metabolic engineering [6], the production of biofuels [7-9] and even the reconstruction of
phylogenetic trees [10]. One of the most important computational tools for the analysis of metabolic models
is flux balance analysis (FBA) [11], which consists essentially in determining a consistent solution
for all fluxes in the reactions of the model that optimizes a given objective.
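
Written out, the optimization behind FBA is usually posed as a linear program. The symbols below are introduced here for illustration and are not taken from the PyNetMet source: S is the stoichiometric matrix, v the vector of reaction fluxes, c the vector of objective weights, and l_j, u_j the lower and upper flux bounds of reaction j.

```latex
\begin{aligned}
\max_{v}\quad & c^{T} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The steady-state condition S v = 0 states that no internal metabolite accumulates, while the bounds encode irreversibility and uptake limits.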

A particular way to study genome-scale metabolic models is to analyze their underlying networks. The
simplest example of such a network defines each metabolite present in a metabolism as a network node
and assigns a connection between two nodes whenever the respective metabolites are connected in the
metabolism through a chemical reaction. Such networks have been widely studied in the literature [12-14].
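
As a minimal sketch of this construction, the following Python 3 code builds the adjacency matrix of a metabolite network from a toy list of reactions. The reaction tuples and the function name `metabolite_adjacency` are hypothetical and only illustrate the idea; they are not part of PyNetMet.

```python
# Toy reactions: (substrates, products, reversible). Names are illustrative only.
reactions = [
    (["A", "B"], ["C"], False),  # A + B -> C
    (["C"], ["D"], True),        # C <-> D
]


def metabolite_adjacency(reactions):
    """Return (names, M) where M[i][j] = 1 if some reaction converts i into j."""
    names = sorted({m for subs, prods, _ in reactions for m in subs + prods})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    M = [[0] * n for _ in range(n)]
    for subs, prods, reversible in reactions:
        for s in subs:
            for p in prods:
                M[index[s]][index[p]] = 1
                if reversible:
                    # A reversible reaction also links product back to substrate.
                    M[index[p]][index[s]] = 1
    return names, M


names, M = metabolite_adjacency(reactions)
```

Each substrate is linked to each product of the same reaction, and reversible reactions contribute links in both directions.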



Typical genome-scale metabolic models comprise around one thousand different metabolites and chemical
reactions and, correspondingly, the underlying metabolic networks are complex structures with around one
thousand interconnected nodes. The analysis of these complex structures would be nearly unfeasible without
the aid of modern computers. Different software is available for performing FBA on a metabolic model,
such as the COBRA toolbox, originally developed for MATLAB [15] but now also available for Python, or the
OptFlux software [16], among others, as well as software for the analysis of networks. All of these packages
have drawbacks. For instance, there are two different standards for the storage of metabolic models, the
SBML [17] and OptGene (also known as BioOpt) [18] formats, and the available software uses either one or
the other, but not both. Moreover, some software is not free (like MATLAB) or is desktop software,
which limits its uses.

In this article we present a series of tools, developed in Python, for dealing with
chemical reactions and analyzing networks and metabolic models. Python is a free, open-source, modular,
object-oriented programming language [19]. Open-source libraries boost the development and advance of
bioinformatics by allowing developers and researchers to build new tools and applications on top of existing
modules. Moreover, modular programming languages like Python allow easy and efficient integration of
their modules with other libraries and software (which is hard to achieve with desktop applications). Python
also offers the Biopython package, which already supports various standards used in bioinformatics and
allows direct connection to different biological databases.
The package presented here is called PyNetMet (from Python Network Metabolism). It comprises four
classes called Enzyme, Network, Metabolism and FBA. The Enzyme class defines a new object called Enzyme,
which stores a chemical reaction. The methods in this class are used extensively by the class
Metabolism in order to organize and extract information from the list of chemical reactions that define
any particular metabolic model. The class Network provides tools to study any graph defined by
interconnected nodes. Several classical graph theory algorithms are programmed inside this class, such as
Dijkstra's algorithm [20] for calculating the mean shortest distance between the nodes of the network or the
paths connecting any two nodes, the Kruskal algorithm [21] for organizing the nodes according to their
clustering similarity, and others. The Metabolism class has basically two functions. First, it works as a
parser for the OptGene and SBML file formats, which can store metabolic models and the parameters needed
for flux balance analysis (FBA). Second, it extracts and summarizes information from the metabolism,
allowing one to find, among other things, disconnected components in the model and reactions that cannot
contribute to the simulation of an organism's metabolism (FBA). Finally, the class FBA allows one to perform
an FBA of the model and to apply other algorithms to search for essential reactions or calculate the
sensitivity of the objective flux with respect to the flux in any reaction. PyNetMet can be downloaded from
the Python Package Index (pypi.python.org/pypi/PyNetMet/1.0).



2 Implementation 

The package PyNetMet consists of four classes: Enzyme, Network, Metabolism and FBA, all fully programmed
in Python 2.7. The class Enzyme has no dependencies; it defines a new type of variable which stores
a single chemical reaction. The class Network has a single dependency (for two specific functions), the
Python Imaging Library (PIL), used for making plots that represent the clustering of nodes. The class
Metabolism depends on the classes Enzyme and Network, and the class FBA depends on the class Metabolism
and on the Python library Pyglpk (which contains tools for solving the associated linear optimization problem).



In Tables 4.2 and 4.3 we list, respectively, all attributes and methods (excluding the underscore ones) of
the classes, each with a short description. For a more complete description and more detailed examples of
use, please refer to the manual that accompanies the PyNetMet distribution and is available here as
additional file 1. In the remainder of this section we present each class separately, commenting on some
important aspects and definitions.

To use a class, one only has to import it as a Python module. A few examples are given below.
2.1 Enzyme

The class Enzyme defines a chemical reaction object. It is the main object used to build the Metabolism
object later on. Its obligatory input is a string containing the reaction. This string must be a reaction
written in OptGene format, so it should have two parts separated by " : ". To the left of the colon is
the name of the chemical reaction, and to its right the reaction in the form " a A + b B + ... -> c C + d
D + ...", where the lowercase letters represent numbers for the stoichiometric coefficients and the uppercase
letters are the names of metabolites (molecules). The " -> " can be replaced by " <-> " when the
reaction is reversible. When defining the object one can also give an optional input, also a string, which
indicates the pathway to which the reaction belongs. Metabolite names may contain spaces,
but the Enzyme class will remove these spaces from the names. One can also use symbols such as "+",
taking care not to confuse them with the "+" sign that separates the different molecules.
For example,

>>> from PyNetMet.enzyme import *

>>> enz1 = Enzyme("reac1 : A + 2 B -> C + D")

In this example one defines the variable enz1, which contains the reaction named reac1, where one
molecule of A combines with two molecules of B to give one molecule of C and one of D. One should
always put spaces around the ":", "->" and "+" signs; otherwise the symbols might be confused with
the metabolite names.

The representation of an Enzyme object is its initial string, but with the numbers converted to float
type. Note the following examples:

>>> print enz1
reac1 : 1.0 A + 2.0 B -> 1.0 C + 1.0 D
>>> enz2 = Enzyme("reac2 : A + 2 B -> C D")
>>> print enz2
reac2 : 1.0 A + 2.0 B -> 1.0 CD
>>> enz3 = Enzyme("reac3 : A + 2 B -> C+ D")
>>> print enz3
reac3 : 1.0 A + 2.0 B -> 1.0 C+D
>>> enz4 = Enzyme("reac4 : A + 2 B -> C+ + D")
>>> print enz4
reac4 : 1.0 A + 2.0 B -> 1.0 C+ + 1.0 D

These examples show that spaces may be used in the metabolite names, but that the Enzyme
class removes them. They also show that symbols like "+" or "-" can be part of a name: a "+" is read as a
separator between molecules only when it is surrounded by spaces.
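
The parsing rules described above can be sketched in a few lines of Python 3. This is a minimal illustration under the stated conventions (name and reaction separated by " : ", terms separated by " + ", optional leading coefficient, spaces inside names removed). The function `parse_reaction` and its return format are hypothetical and do not reproduce the actual Enzyme implementation.

```python
def parse_reaction(text):
    """Parse 'name : a A + b B -> c C' into a small dictionary (illustrative only)."""
    name, equation = text.split(" : ", 1)
    reversible = " <-> " in equation
    arrow = " <-> " if reversible else " -> "
    left, right = equation.split(arrow, 1)

    def parse_side(side):
        terms = []
        # Only a "+" surrounded by spaces separates two molecules.
        for term in side.split(" + "):
            tokens = term.split()
            try:
                coeff = float(tokens[0])  # leading stoichiometric coefficient
                tokens = tokens[1:]
            except ValueError:
                coeff = 1.0               # no coefficient given
            # Remaining tokens are joined, which removes spaces from the name.
            terms.append((coeff, "".join(tokens)))
        return terms

    return {
        "name": name.strip(),
        "reversible": reversible,
        "substrates": parse_side(left),
        "products": parse_side(right),
    }


r4 = parse_reaction("reac4 : A + 2 B -> C+ + D")
```

Here `r4["products"]` contains the two metabolites "C+" and "D", whereas parsing "reac3 : A + 2 B -> C+ D" would yield the single metabolite "C+D", as in the examples above.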



Apart from the methods in Table 4.3, the class Enzyme also has a few underscore methods programmed:
__add__, __sub__, __rmul__ and others, which define how Enzyme objects can be added, subtracted,
multiplied by constants, etc. The result of these mathematical operations on chemical reactions is rather
intuitive, and one can refer to the manual for a few examples.
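
One plausible reading of such reaction arithmetic is the addition of net stoichiometries, sketched below in Python 3. This is an assumption made for illustration: reactions are represented as plain dictionaries (negative coefficients for substrates, positive for products) and the helper `combine` is hypothetical; the manual remains the reference for how Enzyme objects actually behave.

```python
def combine(r1, r2, factor=1.0):
    """Return r1 + factor * r2 as a net stoichiometry, dropping cancelled metabolites."""
    total = dict(r1)
    for met, coeff in r2.items():
        total[met] = total.get(met, 0.0) + factor * coeff
    return {met: c for met, c in total.items() if c != 0.0}


# A + 2 B -> C + D
reac1 = {"A": -1.0, "B": -2.0, "C": 1.0, "D": 1.0}
# C -> E
reac5 = {"C": -1.0, "E": 1.0}

# Summing the two reactions cancels the intermediate C: A + 2 B -> D + E
net = combine(reac1, reac5)
```

With `factor=-1.0` the same helper expresses a subtraction, so subtracting a reaction from itself leaves an empty stoichiometry.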



2.2 Network 

The Network class defines a graph and contains many algorithms for its analysis. It is initialized with
one obligatory input and one optional input. The obligatory input is the N x N adjacency matrix of the
network (N is the number of nodes in the network) and the optional one is a list with the names of the nodes;
if this is not given, the nodes are named with the numbers from 0 to N-1. The adjacency matrix, M, is a list
of N elements, where each element is a list with N elements, each element being 0 or 1. If M[i][j] is 1,
node i has a directed connection to node j. If the M matrix is symmetric, the network is undirected,
meaning there is no distinction between a link from node i to j and one from node j to i. Otherwise, the
network is interpreted as a directed graph, where each connection has an outgoing and an incoming node.

From the mathematical point of view, a network is defined by a list of nodes and a list of edges. In order
to represent such an object we start from the adjacency matrix M. This is the N x N matrix of zeros
and ones, where N is the number of nodes in the network, and a directed edge exists from node i to
node j if the element M[i][j] of the matrix is 1. In the case of an undirected network, the total number of
edges is half the total number of ones in the M matrix. Another way to define a network is with a list of
N elements where each element is the list of neighbors of a given node. In our class three such
lists are created from the input M: linksin, linksout and neigbs, which hold, for each node, the incoming
edges, the outgoing edges and all edges regardless of direction, respectively.

One can define a few attributes for each node. First, the degree of a node is the number of connections it
has to other nodes. In its calculation we do not consider the direction of the connection, so what the
class actually calculates for this attribute (kis) is the length of each element in the list neigbs.
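
The relation between the adjacency matrix and the neighbor lists can be sketched as follows in Python 3. The function `neighbor_lists` is a hypothetical helper written for this illustration; only the names linksin, linksout, neigbs and kis are borrowed from the text.

```python
def neighbor_lists(M):
    """Build incoming, outgoing and undirected neighbor lists from an adjacency matrix."""
    n = len(M)
    # Node i points to j whenever M[i][j] is 1.
    linksout = [[j for j in range(n) if M[i][j]] for i in range(n)]
    # Node i receives an edge from j whenever M[j][i] is 1.
    linksin = [[j for j in range(n) if M[j][i]] for i in range(n)]
    # Neighbors regardless of direction.
    neigbs = [sorted(set(linksin[i]) | set(linksout[i])) for i in range(n)]
    # The degree is the number of undirected neighbors.
    kis = [len(nb) for nb in neigbs]
    return linksin, linksout, neigbs, kis


# Directed toy network: 0 -> 1, 0 -> 2, 1 -> 2
M = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
linksin, linksout, neigbs, kis = neighbor_lists(M)
```

For this toy matrix every node ends up with degree 2, even though node 2 has no outgoing edge.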

Another attribute of a node is its clustering coefficient. It is defined by:

$$C_i = \frac{2 E_i}{k_i (k_i - 1)} \qquad (1)$$

where $k_i$ is the degree of node i, and $E_i$ is the number of connections between the neighbors of node i.
The average clustering of a network can be calculated straightforwardly:

>>> Cbar = sum(net.Cis)/net.nnodes

given that the variable net is a Network object.
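
As a worked example, the sketch below computes clustering coefficients from undirected neighbor lists in Python 3, using the standard definition (twice the number of links among the neighbors of a node, divided by k(k-1), where k is its degree). Assigning 0 to nodes with fewer than two neighbors is an assumption of this sketch, and the function name is hypothetical.

```python
def clustering_coefficients(neigbs):
    """Clustering coefficient of every node, given undirected neighbor lists."""
    coeffs = []
    for nb in neigbs:
        k = len(nb)
        if k < 2:
            coeffs.append(0.0)  # assumption: undefined cases are set to zero
            continue
        nb_set = set(nb)
        # Count each link between two neighbors once (a < b).
        links = sum(1 for a in nb for b in neigbs[a] if b in nb_set and a < b)
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
neigbs = [[1, 2, 3], [0, 2], [0, 1], [0]]
Cis = clustering_coefficients(neigbs)
Cbar = sum(Cis) / len(Cis)
```

Node 0 has three neighbors with a single link among them, giving 1/3, while nodes 1 and 2 each sit in a closed triangle and get 1.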



Next, we define the topological overlap ($O_{ij}$) between two nodes according to [14]:

$$O_{ij} = \begin{cases} 1, & \text{if } i \text{ is connected to } j \\ \dfrac{C_{ij}}{\min(n_i, n_j)}, & \text{otherwise} \end{cases} \qquad (2)$$

where $C_{ij}$ is the number of common neighbors between nodes i and j and $\min(n_i, n_j)$ is the minimum
between the number of neighbors of nodes i and j.
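
The definition of the topological overlap can be illustrated with a short Python 3 sketch: the overlap is 1 for directly connected nodes and otherwise the number of common neighbors divided by the smaller neighbor count. The function name is hypothetical, and setting the diagonal to 1 as well as returning 0 when one node has no neighbors are assumptions of this sketch.

```python
def topological_overlap(neigbs):
    """Topological overlap matrix from undirected neighbor lists."""
    n = len(neigbs)
    sets = [set(nb) for nb in neigbs]
    O = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                O[i][j] = 1.0  # assumption: a node fully overlaps with itself
            elif j in sets[i]:
                O[i][j] = 1.0  # directly connected nodes
            else:
                smallest = min(len(sets[i]), len(sets[j]))
                if smallest:
                    O[i][j] = len(sets[i] & sets[j]) / smallest
    return O


# Edges: 0-1, 0-2, 0-3, 1-2, 3-4
neigbs = [[1, 2, 3], [0, 2], [0, 1], [0, 4], [3]]
O = topological_overlap(neigbs)
```

Nodes 1 and 3 are not connected but share node 0 as a neighbor, so their overlap is 1/2; nodes 1 and 4 share nothing and get 0.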



In [14] a method for grouping the nodes in clusters is proposed, basically by constructing a dendrogram
(tree) with the values of the topological overlap ($O_{ij}$). This tree can be constructed with the Kruskal
algorithm, which is implemented in the Network class. Another interesting method for ordering the nodes is
proposed in [22]. Although this latter method has many improvements with respect to the dendrogram one,
it is based on a Monte Carlo simulation and is computationally very costly. Here we propose yet a different
method, which is computationally more efficient and returns results at least as good as the dendrogram
method.

The objective of the method is to reorder the nodes in the adjacency matrix (or the topological overlap
one) such that nodes close to each other are correlated, in the sense that they share neighbors which are
also correlated among themselves. In this way one obtains an ordering where nodes belonging to common
clusters are near each other. The algorithm consists of the following steps:

(1) Choose any node i to start with. Add it to the ordering.

(2) From node i, find the node j for which $x^2_{ij}$, defined below, is minimum:

$$x^2_{ij} = \sum_{k \in E'} \frac{1}{\max(0.00001, C_k)} \left( \frac{O_{ik} - O_{jk}}{O_{ik} + O_{jk}} \right)^2 \qquad (3)$$

where E' is the set of all nodes that have not been added to the ordering.

(3) Add node j to the ordering.

(4) Set node j as i and repeat the process from step (2) until the set E' is empty.

The function $\max(0.00001, C_k)$ is used to avoid a division by zero in the case that node k has zero
clustering coefficient. This algorithm is implemented in the Network method plot_nCCs. In the next section
plots obtained with this method and with the dendrogram one for real metabolic models are shown.
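
The greedy ordering of steps (1) to (4) can be sketched in Python 3 as follows. This is an illustration under assumptions, not the code of plot_nCCs: the score is taken as the sum over the remaining nodes k of the squared ratio (O_ik - O_jk)/(O_ik + O_jk), weighted by 1/max(0.00001, C_k); terms whose denominator O_ik + O_jk is zero are skipped, and ties are broken by the lowest node index. The function name `greedy_order` and its parameters are hypothetical.

```python
def greedy_order(overlap, clustering, start=0, floor=1e-5):
    """Order nodes so that consecutive nodes have similar overlap profiles."""
    n = len(overlap)
    order = [start]
    remaining = set(range(n)) - {start}
    current = start
    while remaining:
        def score(j):
            total = 0.0
            for k in remaining:
                denom = overlap[current][k] + overlap[j][k]
                if denom == 0:
                    continue  # assumption: skip terms with no overlap at all
                diff = overlap[current][k] - overlap[j][k]
                total += (diff / denom) ** 2 / max(floor, clustering[k])
            return total

        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = min(sorted(remaining), key=score)
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return order


# Two clusters, {0, 1} and {2, 3}, with high overlap inside each cluster.
overlap = [[1.0, 0.9, 0.1, 0.1],
           [0.9, 1.0, 0.1, 0.1],
           [0.1, 0.1, 1.0, 0.9],
           [0.1, 0.1, 0.9, 1.0]]
clustering = [1.0, 1.0, 1.0, 1.0]
order = greedy_order(overlap, clustering, start=0)
```

Starting from node 0, the node with the most similar overlap profile is node 1, so the two clusters end up as contiguous blocks in the ordering.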

2.3 Metabolism 

This class defines an object holding a full metabolic model. The metabolic model can be given as input in
three different ways. By default one uses a single input, a string containing the file name (with path)
of a metabolic model in OptGene format. Alternatively, one can use a file in SBML format. Finally, one
can define lists containing the reactions, constraints, external metabolites and objective function directly
on the command line and use them as input for the class. So, this class works either as a parser for the
OptGene and SBML file formats or as a platform to construct new metabolic models from scratch.

This class also has the dump method, which writes an output file with the stored model in either
OptGene or SBML format. This allows the class to be used as a translator between the OptGene
and SBML file formats, since one can load the model in one format and dump it in the other.

The main attribute of this class is its enzymes list, which contains all chemical reactions in the model.
This list can be altered either directly (which is not advisable, since the other attributes of the class will
not be updated automatically unless one calls the calcs method afterward) or by making use of the bad_reacs,
add_reacs and pop methods.

The use of this class together with the Network and FBA classes offers rich resources for an extensive 
analysis of any metabolic model. 

2.4 FBA 

The FBA class offers tools for performing flux simulations and analyses of a metabolic model. It has methods
based on FBA for studying essential reactions, the sensitivity of the objective function
with respect to any given reaction and the comparison of different realizations of the FBA, among others.

To call this class one must give one obligatory input, which is a Metabolism object with a metabolic
model. It can also receive two optional inputs, which are the precision (eps, the value under which a flux is
considered zero, by default set to $10^{-10}$) and a choice between maximizing and minimizing the objective
(the default choice is to maximize).

This class has one underscore method, __sub__, which defines the subtraction of two FBAs. This
method actually compares two different FBA outputs, returning a string with four columns: the first with
the name of each reaction, the second and third with the flux of the reaction in each one of the FBAs and
the fourth with the relative difference in the fluxes ($100\% \cdot |v_1 - v_2| / |v_1|$). In the case where
the first flux is zero and the second is not, it returns the string "NA" in this column.
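
The comparison of two flux distributions can be sketched in Python 3 as below. The sketch assumes that the relative difference is 100 |v1 - v2| / |v1| and that two zero fluxes count as a difference of zero; fluxes are held in plain dictionaries and the function `compare_fluxes` is hypothetical, so this is not the __sub__ implementation itself.

```python
def compare_fluxes(fluxes1, fluxes2):
    """Return rows (reaction, v1, v2, relative difference in %) for two flux sets."""
    rows = []
    for name in fluxes1:
        v1, v2 = fluxes1[name], fluxes2[name]
        if v1 == 0:
            # No relative change can be defined with respect to a zero flux.
            rel = 0.0 if v2 == 0 else "NA"
        else:
            rel = 100.0 * abs(v1 - v2) / abs(v1)
        rows.append((name, v1, v2, rel))
    return rows


# Toy flux values, not taken from any of the models discussed here.
first = {"r1": 2.0, "r2": 0.0, "r3": 0.0}
second = {"r1": 1.0, "r2": 0.5, "r3": 0.0}
rows = compare_fluxes(first, second)
```

In this toy case r1 changes by 50%, r2 is reported as "NA" because its first flux is zero, and r3 is unchanged.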



3 Results and Discussion 

In this section we exemplify some uses of our tools by analyzing real metabolic models taken from the
literature.

We chose three models to work with. The first is the iSyn811 model of Synechocystis sp. PCC 6803 [9]. The
second is the metabolic model iCM925 for the organism Clostridium beijerinckii NCIMB 8052 [23], and the last
is the model iAK692 for Spirulina platensis C1 [24]. This last model comes in three different versions;
we use the first one. All these models are available from the journals as supplementary materials, the
first one in OptGene format and the other two in SBML format. They have been downloaded and saved in
a working folder and can be accessed directly by the PyNetMet tools.

>>> from PyNetMet.metabolism import *
>>> syn = Metabolism("iSyn811.txt")
>>> cbe = Metabolism("iCM925.xml", filetype="sbml")
>>> ak = Metabolism("iAK692.xml", filetype="sbml")

One might notice a discrepancy in the number of reactions and metabolites between the model reported
in the literature and the one loaded by the tools. This is because the Metabolism class adds to the model
the transport reactions that are needed to perform the FBA. These added reactions are included in a pathway
named _TRANSPORT_. The number of reactions in this pathway should equal the difference between the
numbers reported in the literature and the actual numbers of reactions and metabolites in the loaded model.

>>> print cbe
# Reactions : 957
# Metabolites: 900
>>> print cbe.pathnames
['GPW', '_TRANSPORT_']
>>> print len(cbe.pathways[1])
19

The iCM925 model, for instance, is reported to have 938 reactions and 881 metabolites, while the object
cbe has 957 reactions and 900 metabolites. The difference (19) is the number of transport reactions in the
_TRANSPORT_ pathway.

First, let's plot a representation of the topological overlap of the nodes. For each model we make three
plots: the first with the arbitrary order in which the nodes appear in the model, the second with the nodes
ordered by the Kruskal algorithm and the third with the nodes ordered by the algorithm described in
subsection 2.2.

>>> syn.net.plot_matr(syn.net.nCCs, range(syn.nmets), output="plot_syn1.jpg")
>>> syn.net.kruskal(syn.net.nCCs, minimo=False)
>>> syn.net.plot_matr(syn.net.nCCs, syn.net.krusk_ord, output="plot_syn2.jpg")
>>> syn.net.plot_nCCs(output="plot_syn3.jpg")

>>> cbe.net.plot_matr(cbe.net.nCCs, range(cbe.nmets), output="plot_cbe1.jpg")
>>> cbe.net.kruskal(cbe.net.nCCs, minimo=False)
>>> cbe.net.plot_matr(cbe.net.nCCs, cbe.net.krusk_ord, output="plot_cbe2.jpg")
>>> cbe.net.plot_nCCs(output="plot_cbe3.jpg")

>>> ak.net.plot_matr(ak.net.nCCs, range(ak.nmets), output="plot_ak1.jpg")
>>> ak.net.kruskal(ak.net.nCCs, minimo=False)
>>> ak.net.plot_matr(ak.net.nCCs, ak.net.krusk_ord, output="plot_ak2.jpg")
>>> ak.net.plot_nCCs(output="plot_ak3.jpg")

These commands should produce the nine plots shown in figure 4.1.
The average clustering for each network can easily be obtained:

>>> print sum(syn.net.Cis)/syn.net.nnodes
0.16599889162
>>> print sum(cbe.net.Cis)/cbe.net.nnodes
0.24542198734
>>> print sum(ak.net.Cis)/ak.net.nnodes
0.199707555775
Other interesting analyses that can be made using the methods of the Network class are the search
for disconnected components and the study of paths between the nodes of the metabolic network. Once the
method components is called for a network, apart from the disc_comps attribute, it automatically creates
two new attributes, dists and paths, that contain the shortest distances and paths between any two
nodes of the network.

>>> syn.net.components()
>>> print [len(ele) for ele in syn.net.disc_comps]
[976, 2, 5, 2, 2, 2]

The attribute disc_comps is a list containing the list of nodes in each disconnected component of the network.
In the above example we printed the number of nodes in each component, which shows the giant
component (976 metabolites) that comprises the metabolism, and 5 other components which are the result of
reactions disconnected from the main metabolism and that could be removed from the metabolic model. The
Metabolism method bad_reacs removes these reactions, and also reactions where one product and one
substrate appear only once in the whole metabolism, indicating that these reactions are also poorly
connected to the main component.
For the other networks:

>>> cbe.net.components()
>>> print [len(ele) for ele in cbe.net.disc_comps]
[898, 2]

>>> ak.net.components()
>>> print [len(ele) for ele in ak.net.disc_comps]
[797, 5, 3, 3, 3, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
In the above examples the network under study is the one composed only of the metabolites in each
metabolic model. One can choose to work with the bipartite network formed by metabolites and reactions.
In the following examples we build this network in order to study paths between metabolites.

>>> import sys  # For setting a new recursion limit for recursive functions
>>> sys.setrecursionlimit(10000)
>>> [Mreac, names] = syn.M_matrix_reacs()
>>> net = Network(Mreac, names)
>>> iglu = names.index("alpha-D-glucose")
>>> ipyr = names.index("pyruvate")
>>> [paths, dists] = net.calc_dist_wp(net.linksout, iglu)
>>> print [names[ii] for ii in paths[ipyr]]
['alpha-D-glucose', '2.7.1.2b', 'ADP', '2.7.1.40a', 'pyruvate']
>>> print paths[ipyr]
[4, 990, 3, 1003, 21]

In this example we calculated the shortest path from glucose to pyruvate: it goes through reaction 2.7.1.2b,
which has ADP as a product, and ADP is a substrate of reaction 2.7.1.40a, which produces pyruvate. Note that
if one prints paths[ipyr] the numbers that one sees are [4, 990, 3, 1003, 21]. The numbers 3, 4 and 21
correspond to the positions of ADP, alpha-D-glucose and pyruvate in the list syn.metabol. The reaction
nodes, however, are numbered after the metabolites, so the positions of reactions 2.7.1.2b and 2.7.1.40a in
the list syn.enzymes are not 990 and 1003, but instead
1 (990-syn.nmets) and 14 (1003-syn.nmets).
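
The index bookkeeping of the bipartite network can be illustrated with a small Python 3 sketch in which metabolite nodes come first and reaction nodes follow, so that a reaction's position in the reaction list is its node index minus the number of metabolites. The toy reactions, the helper names and the use of a breadth-first search (which yields shortest paths in an unweighted network) are assumptions of this sketch rather than PyNetMet code.

```python
from collections import deque

# Toy model; names are illustrative only.
metabolites = ["glc", "atp", "g6p", "adp", "pep", "pyr"]
reactions = [
    ("hexokinase", ["glc", "atp"], ["g6p", "adp"]),
    ("pyruvate_kinase", ["pep", "adp"], ["pyr", "atp"]),
]
nmets = len(metabolites)


def bipartite_linksout(metabolites, reactions):
    """Outgoing links: substrate -> reaction node, reaction node -> product."""
    index = {m: i for i, m in enumerate(metabolites)}
    names = metabolites + [r[0] for r in reactions]
    linksout = [[] for _ in names]
    for r, (_, subs, prods) in enumerate(reactions):
        node = len(metabolites) + r  # reaction nodes follow the metabolites
        for s in subs:
            linksout[index[s]].append(node)
        for p in prods:
            linksout[node].append(index[p])
    return names, linksout


def shortest_path(linksout, source, target):
    """Breadth-first search; returns the list of node indices or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in linksout[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    if target not in previous:
        return None
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


names, linksout = bipartite_linksout(metabolites, reactions)
path = shortest_path(linksout, names.index("glc"), names.index("pyr"))
reaction_positions = [n - nmets for n in path if n >= nmets]
```

The path alternates between metabolite and reaction nodes, and subtracting `nmets` from the reaction nodes recovers their positions in the reaction list.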

Metabolites that could not be reached from glucose are marked with the symbol "X" in the dists list:

>>> dists.count("X")
221
>>> ndist = filter(lambda x: x != "X", dists)
>>> print 1.*sum(ndist)/len(ndist)
5.78828081813
>>> print max(ndist)
20
>>> dists.count(20)
1
>>> names[dists.index(20)]
'Astxbm'

Here we see that 221 metabolites could not be reached from glucose. Among those that could be reached, the
average shortest path length is around 5.788, and the metabolite furthest from glucose is Astxbm, which is 20
nodes away.
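
The session above is written for Python 2.7, where filter returns a list. For readers working in Python 3, where filter returns an iterator and cannot be passed to len, the same statistics can be obtained with a list comprehension. The distance list below is a toy example, not data from the models.

```python
# Toy list of distances; "X" marks nodes that cannot be reached.
dists = [0, 2, "X", 4, "X", 6]

unreachable = dists.count("X")
reachable = [d for d in dists if d != "X"]  # works in Python 2 and 3

mean_distance = sum(reachable) / len(reachable)
farthest = max(reachable)
farthest_index = dists.index(farthest)
```

The index of the farthest node can then be used to look up its name, as done with names[dists.index(20)] above.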

Apart from the network analysis of the models, one can use the methods in the class FBA. Just by calling the
class with a metabolic model as input, one can directly obtain the FBA result by printing the class object.
The result is a string listing the reaction names and their respective fluxes, ordered by default according to
these fluxes.

>>> from PyNetMet.fba import *
>>> fba_cbe = FBA(cbe)
>>> print fba_cbe
< ... lots of output ... >
R_GAPD       >   5.012953336
R_FDXNH      >   6.963928797
R_H2ex       >   6.963928797
R_ex_h2_e    >   6.963928797

Flux on objective : 0.119427648
Reactions with flux (flux>eps) : 284
Reactions without flux (flux<eps) : 673
Solution status: Optimal

The FBA objects have the method __sub__ defined, which allows a comparison between two realizations of
an FBA. As an example of how it works, let's compare the metabolism of Synechocystis sp. PCC 6803 when
optimizing its growth and when optimizing hydrogen production for a fixed value of growth.

>>> fba_syn1 = FBA(syn)
>>> print fba_syn1.Z
0.0895186102158
>>> gro = fba_syn1.Z
>>> syn.constr[syn.dic_enzs["_Growth"]] = (0.95*gro, 0.95*gro)
>>> syn.obj = [("_H2", "1")]
>>> fba_syn2 = FBA(syn)
>>> print >>open("diff.txt","w"), fba_syn1-fba_syn2

In this series of commands we create the first FBA, where the growth (reaction named "_Growth") of
Synechocystis sp. PCC 6803 is optimized. We then use the metabolic model to create a second FBA, where
the growth is fixed to 95% of its optimized value and the production of hydrogen (reaction
named "_H2") is optimized. The last command creates a file called diff.txt (additional file 2) where one can
see the comparison between these two states of the metabolism. This file has four columns: the first one
is the name of each reaction, and the second and third contain the values of the fluxes in each FBA,
respectively. The fourth column shows the absolute value of the relative change in percentage
($100\% \cdot |v_1 - v_2| / |v_1|$). If the original flux was zero ($v_1 = 0$), "NA" is returned in this
column. By default the reactions are sorted by their difference value, so the first reactions listed in the
file show no difference in their flux and the reactions at the end of it are the affected ones. One can
clearly see that the most affected reactions are those related to the Synechocystis sp. PCC 6803 hydrogenase.

Another straightforward method to analyze an FBA is the essential method, which checks whether a reaction
is essential for producing flux in the objective function. When called, it returns a Boolean value stating
whether the reaction is essential and a second value giving the relative change in the objective flux
when the input reaction is removed.

>>> print syn.enzymes[926]
_H2CO3transport : 1.0 H2CO3_extrac <-> 1.0 H2CO3
>>> print fba_syn1.essential(926)
[False, 0.50000000000000111]

This tells us that the transport of carbonic acid is not essential for the growth of the cell, but its removal
reduces the growth by 50%. We can also count the total number of essential reactions in each model:

>>> print sum([fba_cbe.essential(ii)[0] for ii in xrange(cbe.nreacs)])
166
>>> print sum([fba_syn1.essential(ii)[0] for ii in xrange(syn.nreacs)])
221
>>> fba_ak = FBA(ak)
Warning: repeated index in Stoichiometric matrix. Look into reaction 746. FBA might not be correct.
Warning: repeated index in Stoichiometric matrix. Look into reaction 748. FBA might not be correct.
>>> print ak.enzymes[746]
R_MotexX : 1.0 M_Mo_e <-> 1.0 M_Mo_e
>>> print ak.enzymes[748]
R_NatexX : 1.0 M_Na_e <-> 1.0 M_Na_e
>>> print sum([fba_ak.essential(ii)[0] for ii in xrange(ak.nreacs)])
249

So, the iCM925 model has 166 essential reactions, iSyn811 has 221 and iAK692 has 249. One also sees
that the FBA class recognizes two problematic reactions in the iAK692 model, in this case reactions in which
a metabolite is connected to itself.

The shadow and max_min methods work in a similar way. Each method takes as input
an integer indicating a reaction number. The shadow method has two other optional inputs: the first one
sets the change in the original flux used to calculate the derivative (the result should be independent
of this choice, since the problem is linear) and the second indicates whether one wishes the relative change
or the absolute change.

The method max_min has the fixobj optional input. Its algorithm fixes the flux in the objective function
to its original value times the value in fixobj (this should always be a value between 0 and 1). Then it sets
the input reaction as objective and minimizes it, then maximizes it in its natural direction (S -> P), and
then again minimizes and maximizes it in the reversed direction (S <- P). It returns a two-element list
in which each element is a tuple with the values of the minimized and maximized flux in the reaction, for
the direct direction and for the reversed direction, respectively. If the reaction is irreversible or there
was no feasible solution for some optimization, it returns the string "X".

>>> print fba_syn1.shadow(926)
0.5
>>> print fba_syn1.shadow(926, relat=False)
0.0263290030046
>>> print fba_syn1.max_min(926, fixobj=0.5)
[(0.0, 671.92509141136259), (0.0, 0.0)]
>>> print fba_syn1.max_min(926, fixobj=0.6)
[(0.34000000000011554, 670.31010969363479), ('X', 'X')]

As we saw before, the transport of carbonic acid can be removed at the cost of reducing the growth in the
iSyn811 model by a factor of two. So, calculating its maximal and minimal flux with the growth fixed to half
its maximum value, we find that we do not need the reaction (one is able to minimize it to zero). But if we
fix the growth to 60% of its maximal value, the flux in the transport of carbonic acid must be at least
0.34, and in this case the reaction cannot occur in the reversed direction (which is indicated by the X's in
the second tuple).

These examples are not intended to exhaust the uses of the PyNetMet tools and their functionality, but
should be enough to illustrate their potential.

4 Conclusions 

We have presented the PyNetMet package, which contains four classes (Enzyme, Network, Metabolism and
FBA) intended to facilitate the analysis, curation and construction of networks and metabolic models.
These tools allow one to work with metabolic models in either standard format (OptGene or SBML) and to easily
convert one into the other. The Metabolism class can be used as a platform to produce variants of any model
(in silico mutants) by producing knock-ins or knock-outs with the add_reacs and pop methods, respectively,
and to study their effects straightforwardly with the class FBA.

These tools come in the form of Python modules, which allows the researcher to integrate them with any
other available Python resource. They are also free and open-source software, which allows one to develop
new tools using these as building blocks.

This work also provides examples of the use of these tools with real published metabolic models,
complementary to the ones found in the manual.

The authors present these tools in the hope that the scientific community will find them useful in its
research and might even extend them and use them in new tools, contributing even further to the
development of this field.
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Figures 

4.1 Figure 1 - plot synl, plot syn2, plot syn3, plot cbel, plot cbe2, plot cbe3, plot akl, plot ak2 and 
plot ak3 




Plots of the topological overlap of metabolites. The first, second and third rows refer to the plots 
obtained for the three analyzed models, iSyn811, iCM925 and iAK692, respectively. The plots in the first 
column use an arbitrary ordering of the metabolites, those in the second column use an ordering obtained 
via the Kruskal algorithm, and those in the third column use the ordering obtained by the algorithm 
implemented in the plot_nCCs method of the Network class. 
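
For readers unfamiliar with the quantity plotted here, the following is a minimal sketch of a topological 
overlap matrix. It assumes one commonly used definition, (number of common neighbours + direct link) 
divided by the smaller of the two degrees. Whether the nCCs attribute uses exactly this normalization is 
an assumption, and the function name topological_overlap and the example graph are hypothetical. 

```python
def topological_overlap(adj):
    """Return the topological overlap matrix of an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal.
    Assumed definition: (common neighbours + direct link) / min(k_i, k_j).
    """
    n = len(adj)
    neighbours = [{j for j in range(n) if adj[i][j]} for i in range(n)]
    overlap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                overlap[i][j] = 1.0
                continue
            k_min = min(len(neighbours[i]), len(neighbours[j]))
            if k_min == 0:
                continue  # isolated node: overlap stays 0
            common = len(neighbours[i] & neighbours[j])
            overlap[i][j] = (common + adj[i][j]) / k_min
    return overlap


# Hypothetical example: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
adj[4][2] = 0  # keep the matrix symmetric: node 4 links only to 3 and 5
adj[4][3] = 1
overlap = topological_overlap(adj)
for row in overlap:
    print(["%.2f" % value for value in row])
```

Nodes inside the same triangle overlap strongly, while the two nodes of the bridge overlap weakly. An 
ordering that places strongly overlapping nodes next to each other makes such modules appear as blocks 
along the diagonal of the plot. 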
Tables 

4.2 Table 1 - Attributes of the classes 



Class       Attribute      Brief description
----------  -------------  ---------------------------------------------------------------
Enzyme      name           Name of the reaction.
Enzyme      pathway        The pathway to which the reaction belongs.
Enzyme      reversible     Indicates if the reaction is reversible.
Enzyme      Nsubstrates    Number of substrates.
Enzyme      Nproducts      Number of products.
Enzyme      metabolites    List of all metabolites in the reaction.
Enzyme      substrates     List of substrates.
Enzyme      products       List of products.
Enzyme      issues         Indicates possible issues in the reaction.
Enzyme      issues_info    List of issues.
Enzyme      stoic          List of stoichiometric coefficients.
Enzyme      tup            Tuple with the number of substrates and products.
Network     nnodes         Number of nodes in the network.
Network     nodesnames     List with the names of the nodes.
Network     directed       Indicates if the network is directed or not.
Network     links          List of tuples, indicating all edges in the graph.
Network     nlinks         Total number of directed connections in the network.
Network     linksin        List of edges coming from a given node.
Network     linksout       List of edges going to a given node.
Network     neigbs         List of all nodes with connection to or from a given node.
Network     kis            List with the degree of each node.
Network     Cis            List with the clustering coefficients of each node.
Network     CCs            Matrix with lists of common neighbors of two nodes.
Network     uCCs           Matrix with lists of all neighbors of two nodes.
Network     nCCs           Matrix with the topological overlap of two nodes.
Network     weight         List with the average of each row of matrix nCCs.
Network     sd_wei         List with the standard deviation for each element of weight.
Metabolism  filename       Name of input file (or model).
Metabolism  enzymes        List of all reactions in the model (Enzyme objects).
Metabolism  dic_enzs       Association between reaction name and position in the lists.
Metabolism  nreacs         Total number of reactions.
Metabolism  reac_irr       List of positions of the irreversible reactions.
Metabolism  reac_rev       List of positions of the reversible reactions.
Metabolism  nreac_irr      Number of irreversible reactions.
Metabolism  nreac_rev      Number of reversible reactions.
Metabolism  mets           List of all metabolites in the model.
Metabolism  dic_mets       Association of metabolite name and position in the lists.
Metabolism  nmets          Total number of metabolites.
Metabolism  pathnames      List of comments found in the model file.
Metabolism  pathways       List with lists of reactions per pathway.
Metabolism  reac_per_met   List of lists of reactions where each metabolite appears.
Metabolism  reacs_per_met  List with the number of reactions where each metabolite appears.
Metabolism  M              The adjacency matrix for the metabolites.
Metabolism  net            The metabolites' network (Network class object for the above adjacency matrix).
Metabolism  reactions      Raw data in OptGene format for the reactions.
Metabolism  transport      List of all transport reactions.
Metabolism  external       Raw data in OptGene format for the external metabolites.
Metabolism  external_in    List of metabolites that can come inside the cell from the outside.
Metabolism  external_out   List of metabolites that the cell transports to the outside.
Metabolism  constrains     Raw data in OptGene format for the constraints.
Metabolism  constr         List with constraints for each reaction.
Metabolism  objective      Raw data in OptGene format for the objective (for FBA optimization).
Metabolism  obj            List for the objective function.
FBA         reacs          List with the reactions (Enzyme objects).
FBA         nreacs         Total number of reactions.
FBA         reac_names     List with the names of all reactions.
FBA         mets           List with all metabolites.
FBA         nmets          Total number of metabolites.
FBA         ext_in         List of external metabolites that can enter the cell.
FBA         ext_out        List of metabolites that can leave the cell.
FBA         Mstoic         List of tuples with the non-zero elements of the stoichiometric matrix.
FBA         constr         List of tuples with the constraints to be applied.
FBA         flux           List with the fluxes for all reactions from the last optimization.
FBA         Z              Value of the flux in the objective.
FBA         obj            List of tuples for the objective.
FBA         eps            Precision (value under which a flux is considered to be zero).
FBA         lp             Linear problem (PyGLPK object).
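
To make two of the Network attributes listed above concrete, the following is a minimal sketch of how 
node degrees and clustering coefficients (the quantities stored in kis and Cis) can be computed for an 
undirected graph. The function name degrees_and_clustering and the example graph are hypothetical, and 
the standard undirected definitions are assumed; how the Network class treats directed networks is not 
covered here. 

```python
def degrees_and_clustering(adj):
    """Return (degrees, clustering coefficients) for an undirected graph.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal.
    The clustering coefficient of a node with k neighbours is the number of
    links among those neighbours divided by k * (k - 1) / 2.
    """
    n = len(adj)
    neighbours = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    kis = [len(nb) for nb in neighbours]
    cis = []
    for nb in neighbours:
        k = len(nb)
        if k < 2:
            cis.append(0.0)  # undefined for k < 2; set to zero by convention
            continue
        # Count each pair of neighbours once.
        links = sum(adj[u][v] for a, u in enumerate(nb) for v in nb[a + 1:])
        cis.append(2.0 * links / (k * (k - 1)))
    return kis, cis


# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
kis, cis = degrees_and_clustering(adj)
print(kis)
print(cis)
```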
4.3 Table 2 - Methods of the classes 



Class       Method             Brief description
----------  -----------------  ---------------------------------------------------------------
Enzyme      connects           Checks if two metabolites are connected by the reaction.
Enzyme      copy               Returns a copy of the reaction.
Enzyme      has_metabol        Checks if a metabolite appears in the reaction.
Enzyme      has_product        Checks if a metabolite is a product of the reaction.
Enzyme      has_product_rev    Checks if a metabolite is a product (or substrate, if reversible) of the reaction.
Enzyme      has_substrate      Checks if a metabolite is a substrate of the reaction.
Enzyme      has_substrate_rev  Checks if a metabolite is a substrate (or product, if reversible) of the reaction.
Enzyme      make_irr           Returns the irreversible version of the reaction.
Enzyme      make_rev           Returns the reversible version of the reaction.
Enzyme      pop                Removes a metabolite from the reaction.
Enzyme      rev_reac           Returns the reaction in the reversed order (swaps substrates and products).
Enzyme      stoicm             Returns the stoichiometric coefficient of a metabolite.
Network     plot_nCCs          Orders and plots the topological overlap for the nodes.
Network     plot_matr          Plots a given matrix with a given ordering.
Network     kruskal            Runs the Kruskal algorithm on a given matrix.
Network     calc_all_dists     Calculates distances of all nodes to all others.
Network     calc_all_dists_wp  Same as calc_all_dists, but also returning the paths.
Network     calc_dists         Same as calc_all_dists, but for a single node.
Network     calc_dists_wp      Same as calc_dists, but also returning the paths.
Network     components         Searches all disconnected components of the network.
Metabolism  calcs              Calculates all attributes based on the enzymes list.
Metabolism  add_reacs          Adds reactions to the metabolism.
Metabolism  pop                Removes a single reaction from the metabolism.
Metabolism  dump               Writes an output file with the model.
Metabolism  M_matrix           Returns the adjacency matrix for the metabolites network.
Metabolism  M_matrix_reacs     Returns the bipartite adjacency matrix of metabolites and reactions.
Metabolism  bad_reacs          Removes reactions belonging to disconnected components of the network.
Metabolism  write_log          Writes an output file with information on the model.
FBA         print_flux         Returns the flux of a single reaction.
FBA         fba                Prepares and performs the FBA.
FBA         shadow             Calculates the derivative (sensitivity) for a given reaction.
FBA         essential          Tests whether a reaction is essential for producing flux in the objective.
FBA         max_min            Returns the maximum and minimum flux of a reaction for a fixed objective value.
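
As an illustration of the algorithm behind the kruskal method listed above, the following is a minimal 
sketch of Kruskal's algorithm for a minimum spanning tree on a weight matrix, using a union-find 
structure. It is a generic textbook version and not PyNetMet's implementation: the function signature, 
the use of None for missing edges and the example weights are assumptions, and the step that turns the 
tree into a node ordering is not shown. 

```python
def kruskal(weights):
    """Return the edges (w, i, j) of a minimum spanning forest.

    weights is a symmetric matrix; weights[i][j] is None when i and j are not linked.
    """
    n = len(weights)
    parent = list(range(n))

    def find(x):
        # Find the root of x, shortening the path along the way (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider every edge once (i < j), from the lightest to the heaviest.
    edges = sorted((weights[i][j], i, j)
                   for i in range(n) for j in range(i + 1, n)
                   if weights[i][j] is not None)
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge joins two different components
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


# Hypothetical 4-node example.
N = None
weights = [
    [N, 1, 4, N],
    [1, N, 2, 5],
    [4, 2, N, 3],
    [N, 5, 3, N],
]
tree = kruskal(weights)
print(tree)
```

To favour strongly overlapping pairs instead of light edges, the same routine could be applied to a 
matrix of dissimilarities, for example one minus the topological overlap; that choice is likewise an 
assumption and not a statement about the library. 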



Additional Files 

Additional file 1 - PyNetMet_manual.pdf 

Manual for the PyNetMet package. 



Additional file 2 - diff.txt 

Differences in the fluxes of the reactions of the iSyn811 model when optimizing growth or h2. 